 quechua language


Quechua Speech Datasets in Common Voice: The Case of Puno Quechua

Huaman, Elwin, Huaman, Wendi, Huaman, Jorge Luis, Quispe, Ninfa

arXiv.org Artificial Intelligence

Under-resourced languages, such as the Quechua languages, face data and resource scarcity that hinders their development in speech technology. To address this issue, Common Voice presents a crucial opportunity to foster open, community-driven speech dataset creation. This paper examines the integration of Quechua languages into Common Voice. We detail the current 17 Quechua languages, presenting Puno Quechua (ISO 639-3: qxp) as a focused case study that includes language onboarding and corpus collection of both read and spontaneous speech data. Our results demonstrate that Common Voice now hosts 191.1 hours of Quechua speech (86% validated), with Puno Quechua contributing 12 hours (77% validated), highlighting Common Voice's potential. We further propose a research agenda addressing technical challenges, alongside ethical considerations for community engagement and indigenous data sovereignty. Our work contributes towards inclusive voice technology and digital empowerment of under-resourced language communities.
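To make the reported figures concrete, the following minimal sketch converts the validation percentages into approximate validated hours. The helper name `validated_hours` is hypothetical, and because the abstract gives rounded percentages, the results are estimates rather than figures from the paper.

```python
def validated_hours(total_hours: float, validated_pct: float) -> float:
    """Approximate validated hours from a total and a (rounded) percentage."""
    return total_hours * validated_pct / 100.0


# Figures as stated in the abstract.
ALL_QUECHUA_HOURS = 191.1   # all Quechua languages on Common Voice
PUNO_QUECHUA_HOURS = 12.0   # Puno Quechua (qxp) only

all_validated = validated_hours(ALL_QUECHUA_HOURS, 86)    # about 164.3 hours
puno_validated = validated_hours(PUNO_QUECHUA_HOURS, 77)  # about 9.2 hours

# Share of the total Quechua recordings contributed by Puno Quechua.
puno_share_pct = PUNO_QUECHUA_HOURS / ALL_QUECHUA_HOURS * 100  # about 6.3%

print(f"Validated (all Quechua): {all_validated:.1f} h")
print(f"Validated (Puno Quechua): {puno_validated:.1f} h")
print(f"Puno Quechua share of total hours: {puno_share_pct:.1f}%")
```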


A primer on getting neologisms from foreign languages to under-resourced languages

Camacho, Luis

arXiv.org Artificial Intelligence

Neologisms are uses, expressions, and words that did not traditionally exist in a language but are incorporated into it because speakers need to adapt to a new reality [1]. That is, neologisms are the new words and expressions that speakers incorporate into a language as new things and new ways of doing things arise that need names. They are the exact opposite of archaisms. The appearance of neologisms is a common and ordinary process in all languages, which must adapt and update or die. However, a word can be considered a neologism only for a certain time: once it has been incorporated and normalized as part of the language, it ceases to be a novelty. The simplest way to classify neologisms is by the method used to create them. Thus we have: 1. Morphological neologisms: they are built from words that already exist in the language, through the processes of composition or derivation. For example, the word "aircraft" was once a neologism, formed by compounding "air" and "craft". This also happens with "teleoperators" or with "biosecurity".


Getting Quechua Closer to Final Users through Knowledge Graphs

Huaman, Elwin, Huaman, Jorge Luis, Huaman, Wendi

arXiv.org Artificial Intelligence

The Quechua language and Quechua knowledge bring together millions of people around the world, especially in several South American countries. Unfortunately, only a few resources are available to Quechua communities, and they are mainly stored in PDF format. In this paper, the Quechua Knowledge Graph is envisioned and generated as an effort to bring Quechua closer to Quechua communities, researchers, and technology developers. Currently, 553,636 triples are stored in the Quechua Knowledge Graph, which is accessible on the Web, retrievable by machines, and curated by users. To showcase the deployment of the Quechua Knowledge Graph, use cases and future work are described.
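As a rough illustration of what "triples" that are "retrievable by machines" means, the sketch below stores a few subject-predicate-object triples in memory and matches them against a pattern. Everything here is an assumption made for illustration: the `ex:` identifiers, the predicate names, and the `match` helper are hypothetical and are not taken from the Quechua Knowledge Graph or its schema.

```python
from typing import Iterable, Iterator, Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical sample data, not drawn from the actual graph.
TRIPLES: list[Triple] = [
    ("ex:allqu", "rdf:type", "ex:QuechuaWord"),
    ("ex:allqu", "ex:translationEn", "dog"),
    ("ex:wasi", "rdf:type", "ex:QuechuaWord"),
    ("ex:wasi", "ex:translationEn", "house"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for subj, pred, obj in triples:
        if (s is None or subj == s) and (p is None or pred == p) and (o is None or obj == o):
            yield (subj, pred, obj)


# All subjects typed as Quechua words.
words = sorted(subj for subj, _, _ in match(TRIPLES, p="rdf:type", o="ex:QuechuaWord"))

# English translation recorded for one word.
wasi_translations = [obj for _, _, obj in match(TRIPLES, s="ex:wasi", p="ex:translationEn")]

print(words)
print(wasi_translations)
```

A deployed knowledge graph would normally be queried through a dedicated query language and endpoint rather than an in-memory list; the pattern-matching idea is the same.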